1,340 research outputs found

    MAGAN: Margin Adaptation for Generative Adversarial Networks

    Full text link
    We propose the Margin Adaptation for Generative Adversarial Networks (MAGANs) algorithm, a novel training procedure for GANs to improve stability and performance by using an adaptive hinge loss function. We estimate the appropriate hinge loss margin with the expected energy of the target distribution, and derive principled criteria for when to update the margin. We prove that our method converges to its global optimum under certain assumptions. Evaluated on the task of unsupervised image generation, the proposed training procedure is simple yet robust on a diverse set of data, and achieves qualitative and quantitative improvements compared to the state-of-the-art

    Unsupervised Hyperbolic Representation Learning via Message Passing Auto-Encoders

    Get PDF
    Most of the existing literature regarding hyperbolic embedding concentrate upon supervised learning, whereas the use of unsupervised hyperbolic embedding is less well explored. In this paper, we analyze how unsupervised tasks can benefit from learned representations in hyperbolic space. To explore how well the hierarchical structure of unlabeled data can be represented in hyperbolic spaces, we design a novel hyperbolic message passing auto-encoder whose overall auto-encoding is performed in hyperbolic space. The proposed model conducts auto-encoding the networks via fully utilizing hyperbolic geometry in message passing. Through extensive quantitative and qualitative analyses, we validate the properties and benefits of the unsupervised hyperbolic representations. Codes are available at https://github.com/junhocho/HGCAE

    Migration of Elastic Capsules by an Optical Force in a Uniform flow

    Get PDF
    AbstractThe behavior of an elastic capsule by an optical force in a uniform flow is examined by using the penalty immersed boundary method. The elastic capsule is subjected to the laser beam with Gaussian distribution in the perpendicular direction to the fluid flow. The elastic capsule migrated by the optical force along the direction of the laser beam propagation, and the migration distance is dependent on its properties. The oblate capsule with b/a = 0.5 obeying the neo-Hookean constitutive law is first considered, and the effects of the surface Young's modulus and the initial inclination angle on the migration distance are studied. The migration distance of the oblate capsule is increased as the surface Young's modulus increases, and the non-inclined oblate capsule is more migrated than the differently inclined capsules. Then the spherical, oblate, and biconcave capsules obeying the Skalak constitutive law are considered. A comparison of the trajectories of the capsules indicates that the migration of the spherical capsule is the largest. Unlike the oblate capsule, the non-inclined biconcave capsule is less migrated than other inclination angles due to its initial shape

    Contrastive Vicinal Space for Unsupervised Domain Adaptation

    Full text link
    Recent unsupervised domain adaptation methods have utilized vicinal space between the source and target domains. However, the equilibrium collapse of labels, a problem where the source labels are dominant over the target labels in the predictions of vicinal instances, has never been addressed. In this paper, we propose an instance-wise minimax strategy that minimizes the entropy of high uncertainty instances in the vicinal space to tackle the stated problem. We divide the vicinal space into two subspaces through the solution of the minimax problem: contrastive space and consensus space. In the contrastive space, inter-domain discrepancy is mitigated by constraining instances to have contrastive views and labels, and the consensus space reduces the confusion between intra-domain categories. The effectiveness of our method is demonstrated on public benchmarks, including Office-31, Office-Home, and VisDA-C, achieving state-of-the-art performances. We further show that our method outperforms the current state-of-the-art methods on PACS, which indicates that our instance-wise approach works well for multi-source domain adaptation as well. Code is available at https://github.com/NaJaeMin92/CoVi.Comment: 10 pages, 7 figures, 5 table

    High-Fidelity Eye Animatable Neural Radiance Fields for Human Face

    Full text link
    Face rendering using neural radiance fields (NeRF) is a rapidly developing research area in computer vision. While recent methods primarily focus on controlling facial attributes such as identity and expression, they often overlook the crucial aspect of modeling eyeball rotation, which holds importance for various downstream tasks. In this paper, we aim to learn a face NeRF model that is sensitive to eye movements from multi-view images. We address two key challenges in eye-aware face NeRF learning: how to effectively capture eyeball rotation for training and how to construct a manifold for representing eyeball rotation. To accomplish this, we first fit FLAME, a well-established parametric face model, to the multi-view images considering multi-view consistency. Subsequently, we introduce a new Dynamic Eye-aware NeRF (DeNeRF). DeNeRF transforms 3D points from different views into a canonical space to learn a unified face NeRF model. We design an eye deformation field for the transformation, including rigid transformation, e.g., eyeball rotation, and non-rigid transformation. Through experiments conducted on the ETH-XGaze dataset, we demonstrate that our model is capable of generating high-fidelity images with accurate eyeball rotation and non-rigid periocular deformation, even under novel viewing angles. Furthermore, we show that utilizing the rendered images can effectively enhance gaze estimation performance.Comment: Under revie

    Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video

    Full text link
    We propose a self-supervised visual learning method by predicting the variable playback speeds of a video. Without semantic labels, we learn the spatio-temporal visual representation of the video by leveraging the variations in the visual appearance according to different playback speeds under the assumption of temporal coherence. To learn the spatio-temporal visual variations in the entire video, we have not only predicted a single playback speed but also generated clips of various playback speeds and directions with randomized starting points. Hence the visual representation can be successfully learned from the meta information (playback speeds and directions) of the video. We also propose a new layer dependable temporal group normalization method that can be applied to 3D convolutional networks to improve the representation learning performance where we divide the temporal features into several groups and normalize each one using the different corresponding parameters. We validate the effectiveness of our method by fine-tuning it to the action recognition and video retrieval tasks on UCF-101 and HMDB-51.Comment: Accepted by IEEE Access on May 19, 202
    • …
    corecore